Abstract: A distributed denial-of-service (DDoS) attack is one in which a multitude of compromised systems attack a single target. The flood of incoming messages to the target system essentially forces it to shut down. Two machine learning algorithms, AdaBoost and Random Forest, are used to detect DDoS attacks.
I. Introduction

Today, the number of attacks against large computer systems and networks is growing rapidly. One of the major threats to cyber security is the Distributed Denial-of-Service (DDoS) attack, in which the victim network element(s) are bombarded with a high volume of fabricated attack packets originating from a large number of Zombies. The aim of the attack is to overload the victim and render it incapable of performing normal transactions. To protect network servers, network routers and client hosts from becoming the handlers, Zombies and victims of DDoS attacks, a machine learning approach can be adopted as an effective weapon against these attacks.

A Distributed Denial-of-Service (DDoS) attack is one in which the victim's network elements are bombarded with a high volume of fictitious attack packets that originate from a large number of machines. A successful attack allows the attacker to gain access to the victim's machine, enabling the theft of sensitive internal data and, in some cases, causing disruption and denial of service (Sonal R. Chakole, 2014).

Among the various categories of DoS attacks, such as flooding, software exploits and protocol-based attacks, the Distributed Denial-of-Service attack is the most famous. A DDoS attack uses a series of Zombies to launch a flood attack against a single unprotected site. It is carried out in two phases (Dongqi Wang, 2008): the Recruiting phase and the Action phase.

In the Recruiting phase, the attacker works from the master computer and tries to find slave (Zombie) computers to involve in the attack. A small piece of software is installed on each Zombie to run the attacker's commands. The Action phase proceeds through a command, issued by the attacker from the master computer, that instructs the Zombie computers to run their pieces of software. The mission of this software is to send dummy traffic toward the victim. The result is a massive flood of packets that crashes the host or swamps the entire network. Very few networks or hosts can effectively cope with attacks of this scale today. Most owners of handler and Zombie machines are completely unaware that their machines are being used to launch a DDoS attack (Rao, 2015).
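To make the scale argument concrete, the Python sketch below models the Action phase with made-up numbers. It assumes that every recruited Zombie sends traffic at a fixed rate, that the victim can process only a fixed number of packets per second, and that excess packets are dropped indiscriminately, so legitimate and attack packets share the victim's capacity in proportion to their rates. The function name `served_fraction` and all figures are hypothetical, chosen only for illustration.

```python
def served_fraction(legit_pps, zombies, pps_per_zombie, capacity_pps):
    """Fraction of legitimate packets the victim still processes.

    Hypothetical model: offered load = legitimate traffic + traffic from all
    Zombies; when the load exceeds capacity, packets are dropped uniformly at
    random, so every packet is served with probability capacity / load.
    """
    offered = legit_pps + zombies * pps_per_zombie
    if offered <= capacity_pps:
        return 1.0
    return capacity_pps / offered


# Illustrative numbers: 500 legitimate packets/s, 1,000 packets/s per Zombie,
# and a victim able to handle 10,000 packets/s.
for zombies in (0, 10, 100, 1000):
    share = served_fraction(500, zombies, 1000, 10_000)
    print(f"{zombies:>5} Zombies -> {share:.1%} of legitimate traffic served")
```

Under these assumptions, once capacity is exceeded the legitimate share falls roughly in inverse proportion to the number of Zombies, which is why a single host rarely withstands a large attack network.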

"Learning is any process by which a system improves 
performance from experience" [Herbert Alexander Simon]. Machine learning is concerned with computer programs that can automatically adapt and customize themselves to individual users. Machine learning applications are software programs or packages that enable the extraction and identification of patterns from experience. Machine learning is commonly divided into four types: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. In supervised learning, the correct classes of the training data are known. In unsupervised learning, the correct classes of the training data are not known.

Semi-supervised learning is a mix of supervised and unsupervised learning. Reinforcement learning allows the machine or software agent to learn its behaviour based on feedback from the environment (Wikipedia, 2015).

II. Material and Methodology 

A Denial of Service (DoS) attack is an attempt by an attacker to prevent the legitimate users of a service from using that service. DDoS is a type of DoS attack in which multiple compromised systems, often infected with malware, are used to target a single system and deny its service.



Figure 2.1 A DDoS structure (legend: ► data packet)
2.1 DDOS ATTACKS

DDoS attacks can be divided into five major categories that efficiently describe their architectural structure and overall behaviour. The first category, labelled level of computerization, specifies the attack's degree of automation. The second category, named attack network, addresses the communication between the resources used for the actual attack (Zombies) and the source of the instructions initiating the event. Exploited vulnerabilities are the next category for classifying a DDoS attack and describe the actual attack mechanism. The influence category characterizes a DDoS attack based on its impact. The final category, attack intensity dynamics, considers the size of the attack in relation to time (Usman Tariq, 2006).
Figure 2.2 A DDoS attack categorization
2.1.2 DDOS DEFENSES 

As with the attacks, the characteristics of the various DDoS defense mechanisms can be structured into a similar categorization scheme. This scheme consists of four major categories, each subdivided into smaller subcategories.

The first category is the submissive defence mechanism, which is initiated after the attack is detected. The second major category is the active defence mechanism. This category is similar to the previous one; the significant difference is that active defence mechanisms are implemented in order to rapidly mitigate the attack by various measures.

Categorization by action is the next major characteristic that can be used to identify various DDoS defences; it defines the main purpose of the defence mechanism. The last major category is defence deployment position, which addresses the physical location where the defence mechanisms are placed, for example defence mechanisms implemented close to the source of the attack.
Figure 2.3 A DDoS defense mechanism


2.2 MACHINE LEARNING 

A computer program is said to learn from experience E with 
respect to some class of tasks T and performance measure P, 
if its performance at tasks in T, as measured by P, improves 
with experience E. 
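The definition can be read in the setting of this paper with a deliberately tiny, hypothetical example. The task T is labelling a traffic record as attack (+1) or normal (-1) from its packet rate, the performance measure P is accuracy on a fixed test set, and the experience E is a set of labelled records. The threshold learner below (`learn_threshold`) is an assumption made for illustration: it places the decision boundary midway between the largest normal rate and the smallest attack rate it has seen, and it is not one of the algorithms used in this study.

```python
def learn_threshold(examples):
    """Hypothetical learner: boundary midway between the highest packet rate
    seen in normal records and the lowest seen in attack records.
    Assumes the labelled examples (experience E) are separable by rate."""
    max_normal = max(rate for rate, label in examples if label == -1)
    min_attack = min(rate for rate, label in examples if label == +1)
    return (max_normal + min_attack) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test records classified correctly."""
    correct = sum((1 if rate >= threshold else -1) == label
                  for rate, label in test_set)
    return correct / len(test_set)


# Task T: label (packet rate, label) records; +1 = attack, -1 = normal.
test_set = [(20, -1), (80, -1), (120, +1), (400, +1)]
small_E = [(10, -1), (300, +1)]
larger_E = small_E + [(90, -1), (105, +1)]

acc_small = accuracy(learn_threshold(small_E), test_set)
acc_large = accuracy(learn_threshold(larger_E), test_set)
print(acc_small, acc_large)  # prints "0.75 1.0"
```

With more experience E the learned boundary moves from 155 to 97.5 packets/s, so performance P on task T improves, which is exactly the condition in the definition.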
Table 2.1 Supervised learning algorithm classification

2.3.3 Hybrid AdaBoost and Random Forests

In the combined AdaBoost and random forests (ABRF) technique, we use the random forest as the weak learner in order to generate prediction models with a lower error rate. Although AdaBoost works fast with simple weak learners, the random forest is of interest for our real-world data set because few research studies have employed this method for prediction in the networking domain. The hybrid AdaBoost and random forests algorithm consists of the following thirteen steps.
Input: S: training set, $S = \{x_i\}$ ($i = 1, 2, \ldots, n$), labels $y_i \in Y$
K: number of iterations
L: Learn (Random Forests algorithm as weak learner)
f: number of inputs to be used at each tree
B: number of generated trees in the random forest

1) Assign $n$ samples $(x_1, y_1), \ldots, (x_n, y_n)$; $x_i \in X$, $y_i \in \{-1, +1\}$
2) Initialize the weights $D_1(i) = 1/n$, $i = 1, \ldots, n$
3) for $k = 1, \ldots, K$
4)     $E = \emptyset$, sampling with the distribution $D_k$
5)     for $b = 1$ to $B$
6)         $S_b$ = bootstrapSample($S$)
7)         $C_b$ = BuildRandomTreeClassifiers($S_b$, $f$)
8)         $E = E \cup \{C_b\}$
9)     next $b$
10)    Get weak hypothesis $h_k : X \to \{-1, +1\}$ with its error $\epsilon_k = \sum_{i:\, h_k(x_i) \neq y_i} D_k(i)$, and set $\alpha_k = \frac{1}{2}\ln\frac{1 - \epsilon_k}{\epsilon_k}$
11)    Update distribution $D_k$: $D_{k+1}(i) = \frac{D_k(i)\exp(-\alpha_k y_i h_k(x_i))}{Z_k}$, where $Z_k$ normalizes $D_{k+1}$ to sum to 1
12) next $k$
13) Output: $H(x) = \operatorname{sign}\left(\sum_{k=1}^{K} \alpha_k h_k(x)\right)$

Table 2.4 Hybrid AdaBoost and Random Forests 
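The following pure-Python sketch walks through the thirteen steps under simplifying assumptions; it illustrates the technique and is not the authors' implementation. Each tree of the forest is assumed to be a depth-one decision tree (a stump) grown on a bootstrap sample using f randomly chosen features (f is interpreted here as a feature count), the training sample of round k is redrawn according to $D_k$ (boosting by resampling), and the two traffic features (packets per second and mean packet size) and the data are hypothetical. Function names such as `abrf_fit` are invented for this example.

```python
import math
import random


def build_stump(X, y, features):
    """Depth-one tree: the single-feature threshold split with fewest errors."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum((sign if row[j] >= t else -sign) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: sign if row[j] >= t else -sign


def build_random_forest(X, y, B, f, rng):
    """Steps 5-9: B trees, each grown on a bootstrap sample S_b with f random features."""
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrapSample(S)
        features = rng.sample(range(d), f)
        trees.append(build_stump([X[i] for i in idx], [y[i] for i in idx], features))
    # The majority vote of the trees is the weak hypothesis h_k.
    return lambda row: 1 if sum(tree(row) for tree in trees) >= 0 else -1


def abrf_fit(X, y, K=5, B=7, f=1, seed=0):
    """AdaBoost with a random forest as the weak learner (labels in {-1, +1})."""
    rng = random.Random(seed)
    n = len(X)
    D = [1.0 / n] * n                                    # step 2
    ensemble = []
    for _ in range(K):                                   # step 3
        # Step 4: draw this round's training sample according to D_k.
        idx = rng.choices(range(n), weights=D, k=n)
        h = build_random_forest([X[i] for i in idx], [y[i] for i in idx], B, f, rng)
        preds = [h(row) for row in X]
        eps = sum(D[i] for i in range(n) if preds[i] != y[i])   # step 10
        if eps >= 0.5:
            break                                        # no better than chance
        eps = max(eps, 1e-10)                            # avoid division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Step 11: re-weight the samples, then normalise by Z_k.
        D = [D[i] * math.exp(-alpha * y[i] * preds[i]) for i in range(n)]
        Z = sum(D)
        D = [w / Z for w in D]
    return ensemble


def abrf_predict(ensemble, row):
    """Step 13: H(x) = sign(sum_k alpha_k * h_k(x))."""
    score = sum(alpha * h(row) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical records: [packets per second, mean packet size in bytes];
# +1 = DDoS (corrupted) traffic, -1 = normal traffic.
data_rng = random.Random(1)
X, y = [], []
for _ in range(20):
    X.append([data_rng.uniform(10, 40), data_rng.uniform(400, 1400)])
    y.append(-1)
    X.append([data_rng.uniform(200, 500), data_rng.uniform(40, 100)])
    y.append(1)

model = abrf_fit(X, y)
train_acc = sum(abrf_predict(model, row) == label for row, label in zip(X, y)) / len(X)
print(f"training accuracy: {train_acc:.2f}")
```

A full implementation would grow deeper trees and choose features at each split rather than once per tree; the stump keeps the sketch short while preserving the structure of steps 3 to 13.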

This combination has advantages, including improved performance and prediction ability of the models on some data sets. The results obtained show that the combination has a low error rate. The error rate is the basic measure used to investigate the strengths and weaknesses of the algorithms.

III. Results and Tables 


The simulations were performed using Packet Tracer and Wireshark 2.0.2. Each reported value is the average of 6 simulation runs, and the simulation parameters took the following values:

The principal metric in the tests is the percentage of detections, which is assessed in terms of a misbehaviour threshold.

The simulated network traffic is shown in Figure 4.1.



Figure 4.1 Normal packet traffic without anomalies

Figure 4.2 Wireshark traffic analysis of the packets (legend: normal traffic of the packets; corrupted traffic of the packets).


4.2 Data collected 

The following table presents the data obtained from IPRC East (Technical College) in Rwanda in April 2016.



Run             T1       T2       T3       T4       T5       T6
Not corrupted   95.84%   95.35%   87.2%    86.55%   88.5%    95.59%
Corrupted       4.16%    4.65%    12.8%    13.45%   11.5%    4.41%

Table 4.1 Data collected using Wireshark

Averaged over the six runs, 91.5% of the traffic is normal (not corrupted packets) and 8.5% is abnormal (corrupted packets). These figures are close to those of the Akamai report (Akamai, Q2 2015), which gives 92.2% normal traffic (not corrupted packets) and 7.7% abnormal traffic (corrupted packets) for the region that includes Africa, as detailed in that report.
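The averages quoted above can be reproduced directly from Table 4.1. This short check (not part of the original study's tooling) also confirms that the two percentages of each run sum to 100%.

```python
# Percentages from Table 4.1 for runs T1..T6.
not_corrupted = [95.84, 95.35, 87.2, 86.55, 88.5, 95.59]
corrupted = [4.16, 4.65, 12.8, 13.45, 11.5, 4.41]

avg_normal = sum(not_corrupted) / len(not_corrupted)
avg_corrupted = sum(corrupted) / len(corrupted)

# Each run should account for 100% of the captured packets.
assert all(abs(a + b - 100) < 1e-9 for a, b in zip(not_corrupted, corrupted))

print(f"normal: {avg_normal:.1f}%  corrupted: {avg_corrupted:.1f}%")
# prints "normal: 91.5%  corrupted: 8.5%"
```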

4.3 AdaBoost Algorithm implementation

In these experiments, the performance and effectiveness of the proposed algorithm were evaluated on six data sets. The output of the experiment shows that as the traffic of uncorrupted packets increases, the corrupted packets decrease, as presented in Figure 4.3.

Y: dependent values (corrupted)

X: independent values (not corrupted)

$\beta_1 = -1.11698$, $\alpha = 1.102062$



4.4 Random Forests implementation

In these experiments, the performance and effectiveness of the proposed algorithm were evaluated on six data sets. The output of the experiment shows that each traffic sample is independent and contains both corrupted and uncorrupted data, as shown in Figure 4.4.


IJSET@2016 


doi : 10.17950/ijset/v5s9/903 






International Journal of Scientific Engineering and Technology 
Volume No. 5 Issue No. 9, pp: 433-437 


ISSN:2277-1581 
01 September 2016 





Figure 4.4 Relationship between corrupted and not 
corrupted data using Random forest. 

4.5 Hybrid AdaBoost and Random Forest

In these experiments, the performance and effectiveness of the proposed algorithm are compared with 10 single classifiers (Guandong Xu, 2008).

Table I
Performance comparison among classifiers on the training and test sets (all values in %)

                      Training set                        Test set
Classifier            Accuracy  Sensitivity  Specificity  Accuracy  Sensitivity  Specificity
ABRF                  100.00    100.00       100.00       88.60     89.30        87.65
AdaBoost              80.88     78.55        85.28        80.35     77.93        85.05
ADTree                85.09     85.59        84.39        82.28     83.59        80.50
Bagging               91.23     92.24        89.92        83.86     84.64        82.77
C4.5                  92.46     93.19        91.50        84.04     87.38        80.08
Conjunctive Rule      77.54     74.74        83.71        77.54     74.74        83.71
Naive Bayes           84.04     85.54        82.04        83.51     84.97        81.56
NN-classifier         100.00    100.00       100.00       83.86     85.49        81.71
Random forests        99.65     99.69        99.60        85.79     86.63        84.65
RIPPER                87.54     91.15        83.40        85.79     88.25        82.75
SVM                   99.82     99.69        100.00       85.96     86.45        85.29
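The three metrics in Table I follow from a confusion matrix. The sketch below uses hypothetical counts (not the data behind Table I) and treats DDoS (corrupted) traffic as the positive class; the function name is invented for this example.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    Sensitivity is the true-positive rate, TP / (TP + FN); specificity is the
    true-negative rate, TN / (TN + FP).
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, treating DDoS (corrupted) traffic as the positive class.
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"accuracy={acc:.2%}  sensitivity={sens:.2%}  specificity={spec:.2%}")
# prints "accuracy=85.00%  sensitivity=90.00%  specificity=80.00%"
```

Because accuracy is the class-size-weighted average of sensitivity and specificity, it always lies between the two, which holds for every row of Table I and is a quick consistency check on the reconstructed values.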


4.5.1 Model Selection

In these experiments, the performance of the proposed algorithm (ABRF) is compared with three classifiers, namely AdaBoost, random forests and C4.5, using ROC curves. The experimental results are given in Figure 4.5.



Figure 4.5 ROC curve (horizontal axis: False Positive Rate)
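An ROC curve such as the one in Figure 4.5 is obtained by sweeping the decision threshold over a classifier's scores and recording the false positive rate and true positive rate at each step. The sketch below does this for hypothetical scores; it is an illustration of the construction, not the tool used to draw Figure 4.5.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low.

    labels: 1 for attack (positive), 0 for normal traffic (negative).
    """
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, labels) if s >= threshold and label == 1)
        fp = sum(1 for s, label in zip(scores, labels) if s >= threshold and label != 1)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A curve that rises steeply near FPR = 0 indicates a classifier that separates attack traffic from normal traffic well at low false-alarm rates.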


Figure 4.5 illustrates the predictive performance of four classifiers: AdaBoost, random forests, ABRF and C4.5. The results show that the ABRF algorithm improves the prediction ability of random forests at some points and performs relatively well compared with AdaBoost and C4.5 in terms of the ROC curve. However, it is hardly possible to distinguish the performance of the ABRF and random forests models from the ROC curve alone. A more precise model-selection criterion, such as the AUC (area under the ROC curve) score, is therefore needed. The experimental results are shown in Figure 4.6.
Figure 4.6 AUC scores
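The AUC score summarizes a whole ROC curve in one number: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way from hypothetical scores; it is not the tool used to produce Figure 4.6.

```python
def auc_score(scores, labels):
    """Area under the ROC curve via pairwise ranking (labels: 1 = attack, 0 = normal)."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label != 1]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # a tie counts as half
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc_score(scores, labels):.3f}")  # prints "AUC = 0.889"
```

An AUC of 1.0 means every attack record outranks every normal record, while 0.5 corresponds to random guessing, which is why small AUC differences can separate models whose ROC curves look almost identical.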
IV. Conclusion 

In this paper we proposed a combination of the AdaBoost and random forests algorithms for constructing a distributed denial-of-service prediction algorithm. We illustrated the capability and effectiveness of the hybrid machine learning algorithm (AdaBoost with random forests). Further work on prediction using hybrid machine learning algorithms would be of interest.